Analyzing Big EHR Data—Optimal Cox Regression Subsampling Procedure with Rare Events

Abstract

Massive survival datasets are becoming increasingly prevalent with the development of the healthcare industry. Such datasets pose computational challenges unprecedented in traditional survival analysis use-cases. A popular way of coping with massive datasets is to downsample them to a more manageable size, so that the researcher can afford the required computational resources. Cox proportional hazards regression has remained one of the most popular statistical models for the analysis of survival data to date. This work addresses the setting of right-censored and possibly left-truncated data with rare events, in which the observed failure times constitute only a small portion of the overall sample. We propose Cox regression subsampling-based estimators that approximate their full-data partial-likelihood-based counterparts by assigning optimal sampling probabilities to censored observations and including all observed failures in the analysis. Asymptotic properties of the proposed estimators are established under suitable regularity conditions, and simulation studies are carried out to evaluate the finite-sample performance of the estimators. We further apply our procedure to colorectal cancer genetic and environmental risk factors in the UK Biobank.
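The procedure summarized in the abstract (keep every observed failure, subsample the censored observations with unequal probabilities, then fit a weighted partial likelihood) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes no tied failure times and no left truncation, it assumes inverse-probability weights of 1/(q·π_i) for the sampled censored rows, and it defaults to uniform sampling probabilities because the optimal probabilities derived in the paper are not reproduced here. The function names `subsample_censored` and `weighted_cox_newton` are hypothetical.

```python
import numpy as np


def subsample_censored(event, q, probs=None, rng=None):
    """Keep every failure and draw q censored rows with replacement.

    probs: sampling probabilities over the censored rows (uniform if None).
    The optimal probabilities from the paper are NOT implemented here.
    Returns row indices (failures first) and their weights.
    """
    rng = np.random.default_rng(rng)
    event = np.asarray(event, dtype=bool)
    fail_idx = np.flatnonzero(event)
    cens_idx = np.flatnonzero(~event)
    if probs is None:
        probs = np.full(cens_idx.size, 1.0 / cens_idx.size)
    probs = np.asarray(probs, dtype=float)
    draws = rng.choice(cens_idx.size, size=q, replace=True, p=probs)
    idx = np.concatenate([fail_idx, cens_idx[draws]])
    # Failures get weight 1; each sampled censored row gets 1 / (q * pi_i),
    # an inverse-probability weight (assumed weighting scheme).
    w = np.concatenate([np.ones(fail_idx.size), 1.0 / (q * probs[draws])])
    return idx, w


def weighted_cox_newton(time, event, X, w, n_iter=50, tol=1e-8):
    """Newton-Raphson for a weighted Cox partial likelihood (no tied times)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    X = np.asarray(X, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        # Shifting eta by its max avoids overflow; it cancels in every ratio below.
        r = w * np.exp(eta - eta.max())
        grad = np.zeros_like(beta)
        hess = np.zeros((beta.size, beta.size))
        for i in np.flatnonzero(event):
            at_risk = time >= time[i]  # risk set at the i-th failure time
            r_k, X_k = r[at_risk], X[at_risk]
            s0 = r_k.sum()
            xbar = (r_k @ X_k) / s0  # weighted covariate mean in the risk set
            s2 = (X_k * r_k[:, None]).T @ X_k / s0
            grad += X[i] - xbar
            hess -= s2 - np.outer(xbar, xbar)
        step = np.linalg.solve(hess, grad)  # Newton step: beta - H^{-1} g
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

Passing a non-uniform `probs` vector (one entry per censored row, summing to one) is where an optimal-probability scheme would plug in. Because only failures contribute score terms, each Newton iteration costs roughly (number of failures) × (failures + q) operations instead of (number of failures) × (full sample size).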

Similar articles

Sinusoidal Cox Regression—A Rare Cancer Example

Evidence of an association between survival time and date of birth would suggest an etiologic role for a seasonally variable environmental exposure occurring within a narrow perinatal time period. Risk factors that may exhibit seasonal epidemicity include diet, infectious agents, allergens, and antihistamine use. Typically data has been analyzed by simply categorizing births into months or seas...

Analyzing Big Data with the Hybrid Interval Regression Methods

Big data is a current trend with significant impacts on information technologies. In big data applications, one of the main concerns is handling large-scale data sets that often require computational resources provided by public cloud services. Analyzing big data efficiently has become a major challenge. In this paper, we combine interval regression with the smooth ...

Logistic Regression for Extremely Rare Events

Objectives: The quantitative analysis of extremely rare events and the factors influencing these events poses some difficulties. The objective of my paper is to evaluate logistic regression for events millions of times rarer than non-events. Methods: Based on former theoretical and experimental results, a simulation study is conducted. Specialized software is developed and supplied with this paper. ...

EHR Big Data Deep Phenotyping

Objectives: Given the quickening speed of discovery of variant disease drivers from combined patient genotype and phenotype data, the objective is to provide methodology using big data technology to support the definition of deep phenotypes in medical records. Methods: As the vast stores of genomic information increase with next generation sequencing, the importance of deep phenotyping increase...

Logistic Regression in Rare Events Data

We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as ...

Journal

Journal title: Journal of the American Statistical Association

Year: 2023

ISSN: 0162-1459, 1537-274X, 2326-6228, 1522-5445

DOI: https://doi.org/10.1080/01621459.2023.2209349